Mapping custom instructions for the Toshiba media embedded processor (MeP)
Although processor-to-hardware partitioning can be successfully resolved by a
combination of designer experience, precedent, tools such as profilers and
data-transfer analyzers, and a degree of patience and understanding, no engineer
underestimates this task. Toshiba's MeP (media embedded processor) is a case in
point.
Developed by Toshiba, the MeP is a programmable platform for creating a
system-on-chip (SoC) that is targeted at applications that require digital media
processing functionality such as video and audio. The multiple standards that
apply to digital media are constantly evolving; thus, in order to be competitive
in this dynamic environment, complicated functions need to be implemented in a
short space of time and in a platform that can efficiently reuse intellectual
property (IP). As an answer to this, the MeP is provided to users as soft IP.
The MeP IP is divided into the following categories:
Once the partitioning has been completed and verified as being accurate,
the next challenge is to map the partitioned design onto a processor and custom
hardware architecture that consists of fixed buses. In a scenario based on
designing with the MeP in its Developer's Kit, the custom hardware design is
wrapped with logic compatible with the chosen bus. The kit provides three main
types of bus: the control bus, the DSP instruction bus, and the local bus. These
buses differ considerably in their protocols, so user-mapped logic has to be
shaped to suit whichever bus it is attached to. The size, depth, and flow of
the logic mapped onto these buses affect the performance of the entire system
during transactions.
One common convention used by designers is to map the logic to fit the bus
protocol, as in the DSP instruction bus where designers map their custom
hardware as read-write instructions. For larger, more complex instructions,
the insertion of instruction "busy cycles" becomes necessary, and this stalls
the processor pipeline. Ideally, the processor should be free to carry on with
its other tasks rather than be held up by the instruction being processed.
A clever way to design these instructions is to make the processor push and
pop calculations from the instruction pipeline, where the parameters and
operations are registered, buffered, processed, and buffered again to be
returned. Here, the designer reduces the logic depth of the hardware and
increases performance by pipelining the instructions through. The controlling
processor is freed to spend time executing other instructions and can, with
careful mapping, be interrupted once the result is available.
In the case of the Toshiba MeP processor, this is achieved by using read-only
and write-only instructions for the DSP instruction bus, thereby allowing
calculations to be piped through at the fastest rate executable by the
processor. Results are buffered and can be retrieved at the same rate. In
this way, the Toshiba MeP processor can execute instructions in batches, DSP
or non-DSP, so that they are pipelined efficiently through its architecture
(Fig 1).
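The write-only/read-only scheme described above can be sketched as a small software model. This is only an illustration under stated assumptions: the class name, method names, and the buffer depth of 4 are hypothetical and are not part of the MeP toolkit, and the model ignores cycle timing entirely. It shows only the shape of the idea: operands are pushed without waiting, results are buffered, and the processor pops them later.

```python
from collections import deque


class BufferedDspUnit:
    """Toy model of custom hardware driven by write-only and read-only instructions.

    All names are hypothetical; this is not the MeP toolkit API.
    """

    def __init__(self, operation, depth=4):
        self.operation = operation   # the custom calculation being offloaded
        self.depth = depth           # assumed result-buffer depth
        self.results = deque()       # buffered results awaiting read-back

    def write(self, operand):
        """Write-only instruction: hand over an operand without waiting for a result."""
        if len(self.results) >= self.depth:
            raise OverflowError("result buffer full; read results before writing more")
        self.results.append(self.operation(operand))

    def read(self):
        """Read-only instruction: pop the oldest buffered result."""
        if not self.results:
            raise LookupError("no result available yet")
        return self.results.popleft()

    def result_ready(self):
        """Stands in for the interrupt or status flag raised when a result is available."""
        return bool(self.results)


# Push three operands, then pop the three results in the same order.
unit = BufferedDspUnit(lambda x: x * x, depth=4)
for value in (1, 2, 3):
    unit.write(value)
squares = [unit.read() for _ in range(3)]
print(squares)  # [1, 4, 9]
```

Between the writes and the reads, the controlling code is free to do unrelated work; in real hardware that gap is where the processor would execute its non-DSP instructions.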
By Hammad Hamid, Celoxica,
Courtesy of Programmable Logic DesignLine
45: 21
2005 (11:40 AM)
URL: http://www.embedded.com/showArticle.jhtml?articleID=175007551
The processor pipelines all stages in which the instruction does not raise the BUSY signal. Assuming that each DSP instruction performs its operation for 2 cycles, and that n is the number of pipelined DSP instructions, the total is 4n+1 cycles. For example, running 10 DSP instructions results in 41 cycles of operation by the hardware.
Without any BUSY cycles, the processor can pipeline the stages of the two instructions. In place of each DSP instruction, two instructions (a write and a read) are now executed, with latency between them. For the same 10 calculations, 10 parameters are sent to the hardware and 10 results are received in 20 cycles. The main point here is that the number of instructions written must never exceed the buffer size, so as to avoid an overflow condition. A more optimal design combines the two instructions, relying on the fact that coherent data is received after a known number of DSP instruction writes.
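The cycle counts above can be checked with a few lines of arithmetic. In this sketch the function names are hypothetical, the 2n figure for the split scheme is an extrapolation from the single data point given (20 cycles for 10 write/read pairs), and the buffer depth of 4 is an arbitrary assumption chosen only to show the batching constraint.

```python
def cycles_with_busy(n):
    """The article's figure for n DSP instructions that each raise BUSY: 4n + 1."""
    return 4 * n + 1


def cycles_split(n):
    """Assumed extrapolation of '10 results in 20 cycles' for write/read pairs: 2n."""
    return 2 * n


def batches(operands, buffer_depth):
    """Split operands into groups no larger than the buffer, so writes never overflow it."""
    return [operands[i:i + buffer_depth] for i in range(0, len(operands), buffer_depth)]


n = 10
print(cycles_with_busy(n), cycles_split(n))  # 41 20
print(batches(list(range(n)), 4))            # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each batch would be written in full and then read back before the next batch is started, which keeps the number of outstanding writes at or below the buffer size.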
Processor-to-hardware partitioning is not a simple task, but as we have seen here, the proper tools combined with a little knowledge can make it far less daunting and, in fact, something of an enjoyable challenge.
(More information on the MeP is available from http://www.mepcore.com/; more information on the MeP Developer's Kit is available from www.celoxica.com/products/mep/default.asp).
Hammad Hamid is a Senior Design Engineer with Celoxica. Hammad joined Celoxica in 1999 as a member of the company's early phase engineering team. He has worked in software tools development and applications engineering roles, and was the lead engineer developing Celoxica's hardware/software co-design technology. With a worldwide remit, Hammad is currently concentrating on projects involving custom microprocessor development and integration. He graduated from the universities of Loughborough, UK and Hull, UK with a B.Eng. in Aeronautical Engineering and an M.Sc. in Computer Graphics and Virtual Environments. Hammad can be reached at Hammad.Hamid@celoxica.com.
Copyright 2005 © CMP Media LLC